Skip to content

[core] Add manifest partition pruning to DV validation in MergingSnapshotProducer#15653

Open
anoopj wants to merge 1 commit intoapache:mainfrom
anoopj:validation-partition-pruning
Open

[core] Add manifest partition pruning to DV validation in MergingSnapshotProducer#15653
anoopj wants to merge 1 commit intoapache:mainfrom
anoopj:validation-partition-pruning

Conversation

@anoopj
Copy link
Contributor

@anoopj anoopj commented Mar 16, 2026

validateAddedDVs() currently opens every delete manifest regardless of the conflict detection filter. This PR adds manifest evaluator-based filtering, skipping manifests that can't contain matching partitions. This should speed up commits.

Note that other code paths in the snapshot producer uses ManifestGroup, which already does this filtering. validateAddedDVs() was the only gap where we didn't filter.

@github-actions github-actions bot added the core label Mar 16, 2026
validateAddedDVs currently opens every delete manifest regardless of
the conflict detection filter. This PR adds manifest evaluator-based
filtering, skipping manifests that can't contain matching partitions.

Note that other code paths use ManifestGroup, which already does this
filtering. validateAddedDVs() was the only gap where we didn't filter.
@anoopj anoopj force-pushed the validation-partition-pruning branch from b145372 to 0caa787 Compare March 16, 2026 17:08
@nastra nastra requested a review from amogh-jahagirdar March 16, 2026 18:37
@anoopj anoopj changed the title Add manifest partition pruning to DV validation in MergingSnapshotProducer [core] Add manifest partition pruning to DV validation in MergingSnapshotProducer Mar 17, 2026
@hemanthboyina
Copy link
Contributor

in similar lines, In ManifestFilterManager#removeDanglingDeletesFor receives DataFiles which have partition info but only stores their paths as strings. This means ManifestFilterManager#canContainDroppedFiles returns true for every delete manifest since there's no partition info to filter against. The existing ManifestFilterManager#delete(F File) does this correctly, it stores both the file AND a PartitionSet. then uses this information in ManifestFileUtil#canContainAny to skip manifests whose partition range doesn't overlap.

Please let me know if I am missing something here, happy to contribute if this is a valid improvement.

@anoopj
Copy link
Contributor Author

anoopj commented Mar 17, 2026

@hemanthboyina

Your analysis above looks correct to me. This is an existing issue in ManifestFilterManager. T he fix would be straightforward: store a PartitionSet alongside removedDataFilePaths in removeDanglingDeletesFor, and then use ManifestFileUtil.canContainAny in the canContainDroppedFiles() code branch instead of returning true. Would be happy to review if you open a PR for it.

@hemanthboyina
Copy link
Contributor

raised #15671 for the same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants